What and why?
If we let `MetricOutputHandler` handle this, then `table` will be interpreted as a list of unit metrics. We need to let `TableOutputHandler` take precedence and have unit metrics processed only as a last resort.
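To illustrate why the order matters, here is a minimal sketch that assumes handlers are tried in sequence and the first one whose `can_handle()` accepts an output processes it. The `can_handle`/`process` methods, the `Result` container, and `process_output` are hypothetical. Only the two handler names come from this PR, and the real handlers' matching rules may differ.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Result:
    """Hypothetical container that collects processed outputs."""
    tables: List[Any] = field(default_factory=list)
    metrics: List[Any] = field(default_factory=list)


class TableOutputHandler:
    def can_handle(self, item: Any) -> bool:
        # Assumption: a table is a non-empty list of row dicts.
        return isinstance(item, list) and bool(item) and all(isinstance(r, dict) for r in item)

    def process(self, item: Any, result: Result) -> None:
        result.tables.append(item)


class MetricOutputHandler:
    def can_handle(self, item: Any) -> bool:
        # Deliberately broad: a scalar or ANY list is treated as unit metric values,
        # so it would also swallow a table if it were tried first.
        return isinstance(item, (int, float)) or isinstance(item, list)

    def process(self, item: Any, result: Result) -> None:
        result.metrics.extend(item if isinstance(item, list) else [item])


# First match wins, so the broad MetricOutputHandler goes last as the fallback.
OUTPUT_HANDLERS = [TableOutputHandler(), MetricOutputHandler()]


def process_output(item: Any, result: Result) -> None:
    """Hand the output to the first handler that accepts it (hypothetical helper)."""
    for handler in OUTPUT_HANDLERS:
        if handler.can_handle(item):
            handler.process(item, result)
            return
    raise TypeError(f"No handler for output of type {type(item).__name__}")


result = Result()
process_output([{"metric": "AUC", "value": 0.91}], result)  # claimed by TableOutputHandler
process_output(0.42, result)                                 # falls through to MetricOutputHandler
```

With the two handlers swapped, the same list of row dicts would be flattened into `result.metrics` as if each row were a unit metric value, which is the misinterpretation described above.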
How to test
What needs special review?
Dependencies, breaking changes, and deployment notes
Release notes
Checklist
This PR makes several functional improvements to the unit tests for time series data validation and to the output handling used for metrics reporting. The changes include:
In the ADF (Augmented Dickey-Fuller) tests, the generated time series now has 200 observations instead of 100, which gives the test more statistical power and makes its results more stable. A fixed random seed makes the tests reproducible, and the generated stationary and non-stationary series are now more clearly distinguishable.
The test comparing stationary and non-stationary series now checks that the ADF statistic of the stationary series is more negative than that of the non-stationary series, rather than relying solely on a p-value comparison. Additional checks ensure that both p-values lie within the valid range of 0 to 1.
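The statistic comparison can be sketched with the standard library alone. This is a simplified stand-in, not the PR's test code: it computes the plain Dickey-Fuller t-statistic (a regression with a constant and no lagged difference terms) instead of the augmented test. In practice an ADF implementation such as statsmodels' `adfuller`, which also returns the p-value, would be used. The choice of white noise as the stationary series and a random walk as the non-stationary one, the seed value, and the helper name `dickey_fuller_stat` are all assumptions.

```python
import itertools
import math
import random
from typing import List


def dickey_fuller_stat(y: List[float]) -> float:
    """t-statistic on beta in the regression dy_t = alpha + beta * y_{t-1} + e_t.

    Plain Dickey-Fuller regression with a constant and no lagged difference
    terms, used here as a simplified stand-in for the augmented test.
    """
    x = y[:-1]                                    # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference
    n = len(dy)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    beta = sxy / sxx                              # OLS slope
    alpha = mean_dy - beta * mean_x               # OLS intercept
    rss = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, dy))
    se_beta = math.sqrt(rss / (n - 2) / sxx)      # standard error of beta
    return beta / se_beta


rng = random.Random(42)            # fixed seed (value assumed) so every run sees the same data
n_obs = 200                        # series length used by the updated tests
shocks = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

stationary = shocks                                   # white noise
non_stationary = list(itertools.accumulate(shocks))   # random walk built from the same shocks

stat_stationary = dickey_fuller_stat(stationary)
stat_non_stationary = dickey_fuller_stat(non_stationary)

# Core check mirrored from the updated test: the stationary series gives the
# more negative statistic, i.e. stronger evidence against a unit root.
assert stat_stationary < stat_non_stationary
```

Comparing statistics rather than p-values avoids depending on how a p-value approximation behaves near its bounds, which is presumably why the p-values are only range-checked.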
Several new metric identifiers, such as `AbsoluteError`, `BrierScore`, and `CalibrationError`, have been added to the tests for individual classification metrics, expanding the set of unit metrics covered during testing.
The order of output processing has been adjusted: `MetricOutputHandler` now runs last, so an output is treated as unit metrics only after every other handler has declined it. This prevents outputs such as `table` from being interpreted as a list of unit metrics when `TableOutputHandler` should handle them.
Overall, these changes make the test suite more robust and reliable; core functionality outside the test suite and the output-processing middleware is unchanged.
Test Suggestions
Run the entire test suite to ensure all new and existing tests pass with the updated dataset size and random seed.
Explicitly verify that the ADF statistic is more negative for the stationary series than for the non-stationary one across various scenarios.
Confirm that p-values are always between 0 and 1 for both series across multiple runs.
Test the output processing order to ensure that MetricOutputHandler executes last and that all output handlers function as expected.
Review the new metric identifiers to check that they integrate seamlessly with the overall metric evaluation process.
internal: Not to be externalized in the release notes